Terminating Reliable Broadcast

Terminating Reliable Broadcast (TRB) is a problem in distributed computing that encapsulates the task of broadcasting a message to a set of receiving processes in the presence of faults.[1] In particular, the sender and any other process might fail ("crash") at any time.

Problem Description

A TRB protocol typically organizes the system into a sending process and a set of receiving processes, which may include the sender itself. A process is called "correct" if it does not fail at any point during its execution. The goal of the protocol is to transfer data (the "message") from the sender to the set of receiving processes. A process may perform many I/O operations during protocol execution, but eventually "delivers" a message by passing it to the application on that process that invoked the TRB protocol.

The protocol must provide certain guarantees to the receiving processes. For example, if the sender is correct, then every correct receiving process must deliver the sender's message. If the sender fails, a receiving process may instead deliver a special message, SF ("sender faulty"), but either all correct processes deliver SF or none do. A correct process is therefore guaranteed that whatever it delivers is also delivered by every other correct process.

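More precisely, a TRB protocol must satisfy the following four properties:

  1. Termination: every correct process eventually delivers some value.
  2. Validity: if the sender is correct and broadcasts a message m, then every correct process delivers m.
  3. Agreement: if a correct process delivers a message m, then every correct process delivers m.
  4. Integrity: every process delivers at most one message, and if it delivers a message m other than SF, then m was broadcast by the sender.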
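
The properties can be checked mechanically against a record of a finished execution. The following Python sketch is one way to do so, assuming the conventional property names Termination, Validity, Agreement and Integrity. The `Execution` record, its field names and the `check_trb` function are hypothetical and introduced only for illustration; they are not taken from the cited notes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

SF = "SF"  # stands for the special "sender faulty" value


@dataclass
class Execution:
    """Hypothetical record of one finished TRB run."""
    sender: str
    broadcast: Optional[str]         # message the sender broadcast, or None if it never did
    correct: Set[str]                # processes that never crashed
    delivered: Dict[str, List[str]]  # values each process delivered, in order


def check_trb(run: Execution) -> Dict[str, bool]:
    """Report which of the four properties hold for the recorded run."""
    correct_deliveries = [run.delivered.get(p, []) for p in run.correct]

    # Termination: every correct process delivers some value.
    termination = all(len(d) >= 1 for d in correct_deliveries)

    # Validity: if the sender is correct and broadcast m,
    # every correct process delivers m.
    validity = (
        run.sender not in run.correct
        or run.broadcast is None
        or all(run.broadcast in d for d in correct_deliveries)
    )

    # Agreement: a value delivered by one correct process
    # is delivered by every correct process.
    values = {v for d in correct_deliveries for v in d}
    agreement = all(v in d for v in values for d in correct_deliveries)

    # Integrity: at most one delivery per process, and any value
    # other than SF must be the one the sender broadcast.
    integrity = all(
        len(d) <= 1 and all(v == SF or v == run.broadcast for v in d)
        for d in run.delivered.values()
    )

    return {
        "termination": termination,
        "validity": validity,
        "agreement": agreement,
        "integrity": integrity,
    }
```

For example, a run in which the sender crashed and the correct processes `p` and `q` delivered `"m"` and `SF` respectively passes Termination, Validity and Integrity but fails Agreement.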

The possibility of faults makes these properties harder to satisfy. A simple but incorrect protocol might have the sender broadcast the message to all processes, and have each receiving process deliver the message as soon as it arrives. This protocol does not satisfy Agreement if faults can occur: if the sender crashes after sending the message to some processes but before sending it to the others, then the first set of processes may deliver the message while the second set delivers SF.
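
The failure of the naive protocol can be reproduced in a few lines. The Python sketch below makes several simplifying assumptions that are not part of the problem statement: the sender contacts receivers one at a time, all receivers are correct, and a receiver that hears nothing eventually times out and delivers SF. The names `naive_broadcast`, `crash_after` and `agreement_holds` are hypothetical.

```python
from typing import Dict, List, Optional

SF = "SF"  # stands for the special "sender faulty" value


def naive_broadcast(receivers: List[str], message: str,
                    crash_after: Optional[int] = None) -> Dict[str, str]:
    """Simulate the naive protocol and return what each receiver delivers.

    crash_after=k means the sender crashes after completing k sends;
    None means the sender never crashes.
    """
    delivered: Dict[str, str] = {}
    for i, p in enumerate(receivers):
        if crash_after is not None and i >= crash_after:
            # Assumption: p never receives the message, times out,
            # and delivers SF.
            delivered[p] = SF
        else:
            # p delivers the message as soon as it is received.
            delivered[p] = message
    return delivered


def agreement_holds(delivered: Dict[str, str]) -> bool:
    """All receivers are assumed correct, so they must deliver the same value."""
    return len(set(delivered.values())) <= 1


# Fault-free run: every receiver delivers the message.
ok_run = naive_broadcast(["p1", "p2", "p3"], "m")

# The sender crashes after one send: p1 delivers "m", the rest deliver SF.
crash_run = naive_broadcast(["p1", "p2", "p3"], "m", crash_after=1)
```

Here `agreement_holds(ok_run)` is true, while `agreement_holds(crash_run)` is false, which is the Agreement violation described above.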

Important TRB Protocols

Context in Distributed Computing

TRB is closely related, but not identical, to the fundamental distributed computing problem of Consensus.

References

  1. Alvisi, Lorenzo (2006). "Consensus and Reliable Broadcast". http://www.cs.utexas.edu/users/lorenzo/corsi/cs371d/08F/notes/week8.pdf. Retrieved 2006-05-21.